Semantic is Beautiful: Clustering and Diversifying Search Results with Graph-based Word Sense Induction

نویسنده

Roberto Navigli

چکیده

Web search result clustering aims to facilitate information search on the Web. Rather than presenting the results of a query as a flat list, these are grouped on the basis of their similarity and subsequently shown to the user as a list of possibly labeled clusters. Each cluster is supposed to represent a different meaning of the input query, thus taking into account the language ambiguity issue. However, Web clustering methods typically rely on some notion of textual similarity of search results. As a result, text snippets with no word in common tend to be clustered separately, even if they share the same meaning. In this talk, we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction (WSI). Key to our approach is to first acquire the senses (i.e., meanings) of a query and then cluster the search results based on their semantic similarity to the word senses induced. Our experiments, conducted on datasets of ambiguous queries, show that our approach outperforms both Web clustering and search engines in the clustering and diversification of search results.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction

Web search result clustering aims to facilitate information search on the Web. Rather than the results of a query being presented as a flat list, they are grouped on the basis of their similarity and subsequently shown to the user as a list of clusters. Each cluster is intended to represent a different meaning of the input query, thus taking into account the lexical ambiguity (i.e., polysemy) i...

متن کامل

Clustering Web Search Results with Maximum Spanning Trees

We present a novel method for clustering Web search results based on Word Sense Induction. First, we acquire the meanings of a query by means of a graph-based clustering algorithm that calculates the maximum spanning tree of the co-occurrence graph of the query. Then we cluster the search results based on their semantic similarity to the induced word senses. We show that our approach improves c...

متن کامل

Multilingual Word Sense Induction to Improve Web Search Result Clustering

In [12] a novel approach to Web search result clustering based on Word Sense Induction, i.e. the automatic discovery of word senses from raw text was presented; key to the proposed approach is the idea of, first, automatically inducing senses for the target query and, second, clustering the search results based on their semantic similarity to the word senses induced. In [1] we proposed an innov...

متن کامل

UBC-AS: A Graph Based Unsupervised System for Induction and Classification

This paper describes a graph-based unsupervised system for induction and classification. The system performs a two stage graph based clustering where a cooccurrence graph is first clustered to compute similarities against contexts. The context similarity matrix is pruned and the resulting associated graph is clustered again by means of a random-walk type algorithm. The system relies on a set of...

متن کامل

WoSIT: A Word Sense Induction Toolkit for Search Result Clustering and Diversification

In this demonstration we present WoSIT, an API for Word Sense Induction (WSI) algorithms. The toolkit provides implementations of existing graph-based WSI algorithms, but can also be extended with new algorithms. The main mission of WoSIT is to provide a framework for the extrinsic evaluation of WSI algorithms, also within end-user applications such as Web search result clustering and diversifi...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2012

Semantic is Beautiful: Clustering and Diversifying Search Results with Graph-based Word Sense Induction

نویسنده

چکیده

منابع مشابه

Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction

Clustering Web Search Results with Maximum Spanning Trees

Multilingual Word Sense Induction to Improve Web Search Result Clustering

UBC-AS: A Graph Based Unsupervised System for Induction and Classification

WoSIT: A Word Sense Induction Toolkit for Search Result Clustering and Diversification

عنوان ژورنال:

اشتراک گذاری